AutoBERT-Zero: Evolving BERT Backbone from Scratch

Authors

Abstract

Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks. However, the conventional paradigm constructs the backbone by purely stacking manually designed global self-attention layers, introducing an inductive bias that leads to sub-optimal models. In this work, we make the first attempt to automatically discover a novel pre-trained language model (PLM) backbone from scratch, on a flexible search space containing the most fundamental operations. Specifically, we propose a well-designed search space which (i) contains primitive math operations at the intra-layer level to explore novel attention structures, and (ii) leverages convolution blocks as a supplement to attention at the inter-layer level to better learn local dependency. To enhance the efficiency of finding promising architectures, we propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm, which optimizes both the search algorithm and the evaluation of candidate models. Specifically, we propose an Operation-Priority (OP) evolution strategy that facilitates model search by balancing exploration and exploitation. Furthermore, we design a Bi-branch Weight-Sharing (BIWS) training strategy for fast model evaluation. Extensive experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities on various downstream tasks, proving the architecture's transfer and scaling abilities. Remarkably, AutoBERT-Zero-base outperforms RoBERTa-base (which uses much more data) and BERT-large (which has a much larger model size) by 2.4 and 1.4 points respectively on the GLUE test set.
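The "operation-priority" idea of balancing exploration and exploitation when picking primitive operations can be illustrated with a UCB-style score. This is a minimal hypothetical sketch, not the paper's actual algorithm: the operation list, function names, and scoring constant are all assumptions for illustration.

```python
import math

# Illustrative primitive-operation vocabulary (assumed, not from the paper).
PRIMITIVE_OPS = ["matmul", "softmax", "add", "mul", "conv3", "tanh"]

def op_priority(op, stats, total_trials, c=1.0):
    """UCB-style priority: mean reward (exploitation) plus an
    uncertainty bonus that grows for rarely tried ops (exploration)."""
    count, reward_sum = stats.get(op, (0, 0.0))
    if count == 0:
        return float("inf")  # untested ops are always tried first
    mean = reward_sum / count
    return mean + c * math.sqrt(math.log(total_trials) / count)

def sample_op(stats, total_trials):
    """Pick the primitive op with the highest priority score."""
    scores = {op: op_priority(op, stats, total_trials) for op in PRIMITIVE_OPS}
    return max(scores, key=scores.get)

# Toy usage: with equal trial counts, the op with the best mean reward wins.
stats = {"matmul": (5, 4.0), "softmax": (5, 3.0), "add": (5, 1.0),
         "mul": (5, 1.5), "conv3": (5, 2.0), "tanh": (5, 0.5)}
print(sample_op(stats, total_trials=30))  # prints "matmul"
```

The exploration bonus shrinks as an operation accumulates trials, so the search gradually shifts from trying everything to refining the best-performing operations.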


Similar resources

Literacy from Scratch

“Literacy from Scratch” is a response to the United Kingdom (UK) government’s initiative to develop computer programming skills in both the Primary phase of education (pupils aged 5-11) and the Secondary phase (aged 11-18). The project has several related aspects: it involves the reworking of Primary and Secondary Initial Teacher Training (ITT) programmes at Brunel University, through which Post...


Text Understanding from Scratch

This article demonstrates that we can apply deep learning to text understanding from character-level inputs all the way up to abstract text concepts, using temporal convolutional networks (LeCun et al., 1998) (ConvNets). We apply ConvNets to various large-scale datasets, including ontology classification, sentiment analysis, and text categorization. We show that temporal ConvNets can achieve asto...


The Universe from Scratch

A fascinating and deep question about nature is what one would see if one could probe space and time at smaller and smaller distances. Already the 19th-century founders of modern geometry contemplated the possibility that a piece of empty space that looks completely smooth and structureless to the naked eye might have an intricate microstructure at a much smaller scale. Our vastly increased und...


Quality Estimation from Scratch

This thesis presents a deep neural network for word-level machine translation quality estimation. The model extends the feedforward multi-layer architecture by [Collobert et al., 2011] to learning continuous space representations for bilingual contexts from scratch. By means of stochastic gradient descent and backpropagation of errors, the model is trained for binary classification of translate...


Categories from scratch

The concept of category from mathematics happens to be useful to computer programmers in many ways. Unfortunately, all "good" explanations of categories so far have been designed by mathematicians, or at least theoreticians with a strong background in mathematics, and this makes categories especially inscrutable to external audiences. More specifically, the common explanatory route to approach ca...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i10.21311